Abstract: Fundamental processes in living cells are largely
controlled by macromolecular interactions. Among them,
protein-protein interactions (PPIs) have a critical role,
while their dysregulation can contribute to the pathogenesis
of numerous diseases. Although PPIs were considered
as attractive pharmaceutical targets already some years 
ago, they have been thus far largely unexploited for thera- 
peutic interventions with low molecular weight com- 
pounds. Several limiting factors, from technological hurdles 
to conceptual barriers, are known, which, taken together, 
explain why research in this area has been relatively slow. 
However, over the last decade, the scientific community has
challenged this dogma and become more enthusiastic about the
modulation of PPIs with small drug-like molecules. In fact,
several success stories have been reported, both at
the preclinical and clinical stages. In this review article, writ- 
ten for the 2014 International Summer School in Chemoin- 
formatics (Strasbourg, France), we discuss in silico tools (es- 
sentially post 2012) and databases that can assist the 
design of low molecular weight PPI modulators (these tools 
can be found at www.vls3d.com). We first introduce the 
field of protein-protein interaction research, discuss key 
challenges and comment on recently reported in silico
packages, protocols and databases dedicated to PPIs. Then, we
illustrate how in silico methods can be used and combined 
with experimental work to identify PPI modulators. 
1 Introduction: PPIs, Past and Present

Proteins are polymers composed of amino acids that gener- 
ally fold into a highly specific tertiary structure. Proteins 
can interact with essentially all types of molecules, from
small organic compounds, inorganic salts and metals, sugars,
fatty acids, nucleotides and peptides to the phospholipids of
cell membranes and other proteins. Short overviews of some
key events in the field of protein science and protein-protein
interaction (PPI) research have been reported recently.[1–3]
Here we briefly report some key dates that have contributed
to the field of PPIs (Figure 1).

The term "protein" seems to have been first mentioned in
scientific correspondence on the 10th of July 1838 between
two scientists, Berzelius and Mulder, following the
identification of a new, unknown large molecule. Two theories
are associated with the term: the substance could be
conceived as a primordial substance and linked with the name
Proteus, a Greek mythological character, possibly the first
son of Poseidon; or the term could be linked to Proteus
because this ancient god could assume a variety of shapes,
although this property of proteins was not known at that
time. Around 1920, the interaction between the enzyme trypsin
and one protein inhibitor was anticipated, and so was the
concept of antibody (Antikörper) binding some other
substances (Ehrlich). Important advances also came around
1920 with the invention of ultracentrifugation (Svedberg,
1927) and the realization that proteins could be purified.
Around 1950, it became possible to determine the amino acid
sequence of insulin (Sanger), and the alpha helix and beta
sheet were described by Pauling and Corey. Additional
breakthroughs came from the determination of the
3D structure of proteins by X-ray crystallography: myoglo- 
bin (Kendrew) and hemoglobin (Perutz, Fersht, Simon, Rob- 
erts), in 1959 and 1960, respectively. Interestingly, the X-ray 
structure of hemoglobin is composed of four subunits non- 
covalently bound (i.e., tetramer, obligate complex, see 
below) and such work laid the groundwork for understanding
quaternary structures (nomenclature of Linderstrøm-Lang and
Schellman, 1959) at the structural level and
helped in gaining new insights about allostery (developed 
by Monod and collaborators around 1963). At about the 
same time, the concept of DNA and of mRNA for the syn- 
thesis of proteins was demonstrated by Monod, Jacob and 
Lwoff. Another major biophysical approach to investigate 
the 3D structure of proteins and in some cases of protein- 
protein interactions is NMR, first applied to proteins around 
1982 m while around 1978, Wodak and Janin implemented 
the first modeling algorithm for protein-protein docking. 

Figure 1. Timeline of Protein Science and PPI research.

As our knowledge increased, it became more and more obvious
that proteins were not acting alone. Clarifying the networks
of interactions nevertheless required the development of
large-scale tools together with global collective decisions
to launch large-scale scientific projects such as the Human
Genome Project (1990) and various structural genomics
initiatives (launched around 1998-2000). The yeast-2-hybrid
(Y2H) method reported in 1989 [4] is an example of
a large-scale approach that greatly facilitated the
identification of binary interactions. The first systematic
PPI maps were published in 2000 using Y2H, while maps
resulting from the use of another method, namely affinity
purification-mass spectrometry (AP-MS), started to be
reported around 2002. Indeed, the term interactome (coined
by a group of French scientists headed by B. Jacq) appeared
in the literature in 1999.[5] During the 1990s, the
2000s, and up to now, an impressive amount of experimental
effort has been dedicated to PPIs, using known approaches
tuned to PPIs or newly developed for the direct (measuring
the actual concentrations of the bound and free protein
forms, e.g., gel filtration, ultracentrifugation, etc.) or
indirect (inferring the concentrations from an observed
signal, e.g., many optical methods such as fluorescence-based
methods) analysis of interactions, including PPIs. In
addition, several methods were applied to investigate
affinity, such as isothermal titration calorimetry, surface
plasmon resonance, and fluorescence-based methods. At the
same time, from 1990 up to now, with the realization that
such complex systems could not be assessed with experimental
approaches alone, many in silico methods were developed.
These approaches range from the prediction of protein-protein
complexes by text mining, the visualization of dynamic PPI
networks and the assessment of PPI interfaces up to the
screening of thousands of small molecules and the design of
novel compound collections dedicated to PPIs (Figure 2).

Around the year 2000, as a tremendous amount of work on PPIs
had already been carried out, as it was noticed that PPIs
were playing a major role in many disease conditions [6]
(e.g., in cancer [7]) and because new drug targets were
needed, new projects aiming at identifying low molecular
weight drug-like modulators of PPIs (in addition to the
traditional ways of acting on PPIs, such as with monoclonal
antibodies and other types of proteins and peptides) were
started in several academic and private laboratories.
However, it is important to note that for many years, up to
around 2000-2005, the scientific community essentially
considered that PPIs could not be modulated (inhibited or
stabilized) by drug-like compounds. Since then, the situation
has changed and remarkable efforts are now being made to
rationally design PPI modulators (see for instance the
literature [8–29]). Many databases and in silico tools that
assist drug discovery and chemical biology have been
developed and most URLs for these services can be found at
www.vls3d.com.[30] Of major importance for the research teams
working on methodological developments and applications of
in silico tools in the areas of Health and Biology, the 2013
Nobel Prize in Chemistry was awarded to Karplus, Levitt and
Warshel (see some recent reviews from these scientists
[31–33]). It is indeed the first Nobel Prize given to work
carried out in the field of computational biology and
chemistry.
Figure 2. In silico tools and PPIs. Main in silico tools that assist the investigation of PPIs and the rational design of PPI modulators.



The present review will primarily focus on in silico
approaches (focusing somewhat more on software packages and
databases reported in 2013-2014) that can assist the rational
design of "drug-like" orthosteric PPI inhibitors, while
readers can find recent reviews about other types of
molecules able to modulate PPIs, such as peptides,
macrocycles and antibodies.[21,34–40] Fragment-based
technologies are well suited to target PPIs but will be only
briefly commented upon here, as recent reviews on the topic
have been reported.[37,41,42] We will discuss some aspects of
PPIs, from networks to structural analysis of the interface,
with notes on diseases and target selection. Key in silico
methods that assist the rational design of PPI modulators are
then introduced, with a special emphasis on PPI inhibitors.
We illustrate PPI hit discovery on two recently investigated
biological systems, the VEGF-VEGFR complex [43] and the
anticoagulant activated protein C.[44]



2 PPI Studies Combining In Vitro and In Silico 
Approaches: from Network to Structural 
Features and Mutations of the Interfaces, 
a Short Overview 

It is important first to select the right protein-protein
complex among several hundred thousand known or anticipated
interactions. In order to perform this step in a rational
manner, knowledge about PPI networks can be critical. Yet, 
to gain additional knowledge about the selected com- 
plexes, structural analysis and predictions are usually 
needed. Several of these aspects can be investigated exper- 
imentally but in silico strategies can greatly assist the pro- 
cess. 



2.1 PPI Network 

The explosive growth of PPI data derived from small-scale
to genome-scale studies has led to the development of over
100 databases and in silico services dedicated to PPIs.



These many public PPI databases are important because they
help the scientific community to gain new insights about PPIs
(i.e., data have to be collected, integrated, curated and
translated into knowledge). At present, some databases focus
on specific species and can be very specialized, while others
contain data coming from large-scale studies. Overall, many
databases currently contain redundant information. Some
databases contain data about "experimentally" identified
protein-protein complexes (e.g., with Y2H, gene
co-expression, split ubiquitin, protein complementation
assays, AP-MS; each method has strengths and weaknesses and
there are known artifacts [45]) while others are built using
interactions collected from literature searches; some
databases contain the 3D structures of protein complexes
(see for example [45–47]). One major difficulty with some
large-scale data is that PPIs detected using the same method
or different methods by different research groups, but on the
same organism, can display very limited overlap (false
positives: detectable but functionally irrelevant
interactions; false negatives: interactions that do occur but
are missed), calling for major curation efforts to filter out
unreliable interactions (remove the noise) and for the
quantification of errors. These differences also suggest that
the techniques could be providing complementary
descriptions.[48] There are in fact many reasons for such
discrepancies, including the obvious differences between the
experimental methods used (i.e., some methods can capture
transient interactions, others are geared towards the
identification of stable interactions, etc.). Yet, at
present, these differences represent a serious concern for
the PPI field, as they can preclude meaningful estimates,
from downstream analyses of the resulting networks, of the
size of the functional interactome or of the importance of
some interactions in a given disease condition. Adoption of
the Proteomics Standards Initiative Molecular Interaction
(PSI-MI) format [49] and the related directives of the IMEx
consortium should help improve the quality of the data,[50]
but much work is needed in this area. It has been assumed
until recently that literature-curated PPI data were of
higher accuracy than those produced by high-throughput
studies because they were derived from focused
investigations, but recent analyses suggest that this is no
longer the case.[51]
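
The limited overlap between interaction sets discussed above can be quantified in a simple way. The following is a minimal sketch, assuming each dataset is available as a list of protein identifier pairs; the identifiers and the two toy datasets are hypothetical, and the Jaccard index is only one of several possible overlap measures.

```python
def normalize(pairs):
    """Represent each interaction as an unordered pair, so (A, B) == (B, A)."""
    return {frozenset(pair) for pair in pairs}


def jaccard_overlap(pairs_a, pairs_b):
    """Shared interactions divided by all interactions seen in either set."""
    a, b = normalize(pairs_a), normalize(pairs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy datasets (not taken from any real screen).
y2h_pairs = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
apms_pairs = [("P2", "P1"), ("P3", "P6"), ("P4", "P5"), ("P7", "P8")]

print(jaccard_overlap(y2h_pairs, apms_pairs))
```

Treating pairs as unordered avoids counting the same interaction twice when two sources list the partners in a different order, which is one common source of apparent disagreement between datasets.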

Some well-known databases include the Database of Interacting
Proteins (DIP),[52] the Biomolecular Interaction Network
(BIND),[53] the Molecular Interaction database (MINT),[54]
the Mammalian Protein-Protein Interaction database
(MIPS),[55] the host-pathogen interaction database
(HPIDB),[56] IntAct,[57] BioGRID [58] and STRING.[59]
Structural information can be found at the Protein Data Bank
(PDB) and, for instance, PISite collects interface data from
the PDB.[60] Structures of domain-domain interactions are
available from 3did [61] and iPfam,[62] while a database
providing a spatial classification of 3D protein domain
interactions, KBDOCK, has recently been reported.[63] The
database Instruct contains high-quality 3D structurally
resolved protein interactome networks.[64] Homology models
can also be used to study PPIs further and increase coverage.
The Interactome3D database [46] provides 12000 structurally
resolved PPIs in 8 organisms while Instruct contains over
6500 human PPIs.[64] Another resource, the Protein
Interaction and Molecule Search (PRIMOS) platform, is a novel
web portal that unifies six primary PPI databases (BIND; DIP;
HPRD (Human Protein Reference Database); IntAct; MINT; and
MIPS (Munich Information Center for Protein Sequences)) into
a single consistent repository.[65] Along the same line,
iRefWeb is a bioinformatics resource that offers access to
a large collection of data on protein-protein interactions in
over a thousand organisms. This collection is consolidated
from 14 major public databases.[51] Similarly, curated PPI
data can also be found via the PSICQUIC Web service, which
provides access on the fly to files made available by over 28
source databases.[56] Also, the DASMIweb service currently
has access to 36 distributed data sources. Ten of these
provide interaction data that have been experimentally
determined or curated from the scientific literature, 24 data
sources contain computational predictions, and 2 data sources
can be used for scoring the quality of the interactions.[67]
Another example is UniHI 7 (Unified Human Interactome). This
online tool integrates about 350000 molecular interactions
for more than 30000 human proteins. Besides protein-protein
interactions from 12 different resources (including the HPRD,
BioGRID, IntAct, DIP, BIND and Reactome databases) as well as
four interaction maps produced by computational predictions
and two high-throughput yeast-2-hybrid screens, UniHI 7 also
comprises curated transcriptional regulatory interactions
from three complementary databases: TRANSFAC, miRTarBase and
HTRIdb. In addition to these interactions, the service also
integrates drug target information from DrugBank that can be
mapped and visualized online without having to download,
manually process and load the data into an external
standalone application.[68] The data can be filtered by the
users (e.g., by number of PubMed references, small-scale or
large-scale experiments, direct or indirect connection,
binary or complex interaction).



PPI networks can be derived from data collected by the
above-mentioned methods, namely, methods that probe binary
interactions (e.g., Y2H) and approaches that detect
multi-protein complexes (e.g., AP-MS). Both AP-MS and binary
detection methods probe non-native protein constructs, where
tags are appended to the native polypeptides, potentially
altering their properties.[45] As mentioned above, networks
can also be built from curated PPI data collected from the
literature. All these data can be visualized using, for
example, the Cytoscape package.[69] Such PPI networks,
comprising in human approximately 130000 to 650000 protein
interactions (only a small subset has been fully
experimentally identified),[70,71] can shed new light on
human diseases.[72,73] Further, monitoring portions of the
network that change when cellular states and conditions are
altered could also give new insights about the health and
disease states.

Analysis of PPI networks using computational and statistical
tools helps to understand how networks mediate genotype to
phenotype relationships. If we take binary interactome maps
as an example, these maps were found to have a so-called
scale-free topology with hierarchical modularity.[74] In
networks of this topology, proteins are depicted as nodes and
interactions as edges and, in general, only some proteins,
so-called hubs, have a very large number of interaction
partners (see the literature [75,76] for in-depth discussion
of PPI networks and network visualization) (Figure 3). This
also means that such networks are resilient against failure
of random nodes (e.g., by mutation) but sensitive to targeted
attack of the hubs. Fascinatingly, in both plants and humans,
proteins of viral, bacterial, and fungal pathogens were all
found to target such hub proteins.[2] Essential proteins tend
to be more interconnected than non-essential proteins. It
would seem that human disease-associated proteins, too, are
more interconnected than non-disease proteins.[74]

Figure 3. PPI network representation. A simple illustration
of part of a PPI network with the serine protease thrombin at
the center.
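
The notion of hubs in a network of nodes and edges can be illustrated with a few lines of code. This is a minimal sketch, assuming the network is given as a list of binary interactions; the node names are hypothetical and the degree cutoff used to call a protein a hub is an arbitrary assumption for illustration, not a value taken from the literature.

```python
from collections import defaultdict


def degrees(edges):
    """Number of distinct interaction partners per protein (self-loops ignored)."""
    partners = defaultdict(set)
    for a, b in edges:
        if a != b:
            partners[a].add(b)
            partners[b].add(a)
    return {protein: len(found) for protein, found in partners.items()}


def hubs(edges, min_degree):
    """Proteins with at least `min_degree` partners (cutoff chosen by the user)."""
    return sorted(p for p, d in degrees(edges).items() if d >= min_degree)


# Hypothetical toy network: one highly connected node and a few peripheral ones.
toy_edges = [
    ("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("HUB", "D"),
    ("A", "B"), ("E", "F"),
]

print(degrees(toy_edges))
print(hubs(toy_edges, min_degree=4))
```

In this toy network, removing a peripheral node such as "E" leaves most connections intact, whereas removing "HUB" disconnects "C" and "D" from everything else, which mirrors the resilience and sensitivity argument made above.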

Finally, as whole genome and transcriptome sequencing gets
cheaper and faster, gene expression profile analyses of
normal and pathological conditions should also contribute to
the initial identification of clinically relevant PPIs that
will then require further in-depth investigation. For
instance, the pro-survival IAP and BCL-2 proteins represent
highly attractive PPI targets since their over-expression is
associated with tumor progression and maintenance. Many
comparative gene expression databases have therefore been
developed that allow the retrieval, analysis and comparison
of gene expression patterns within or among species (see for
instance the literature [77–79]). Although it is not known
exactly how many PPIs would have a therapeutic potential,
such a picture strongly suggests that the design of low
molecular weight compounds targeting PPIs could have a major
impact in the near future. With regard to drug discovery
endeavors, even if at present PPI networks do not fully
reveal the topologies of networks truly operating in the
cells because of the limitations mentioned above, such
analysis can still help to propose a list of potentially
interesting PPI targets that then need to be explored further
and/or could pinpoint several proteins that would need to be
targeted at the same time following, for example, the concept
of rational polypharmacology design.[80–83]

2.2 Structural Analysis of Protein Protein Interfaces: from 
Experimental Structures to Point Mutations Involved in the 
Disease State 

By introducing atomic resolution knowledge (e.g., using
X-ray, NMR, high resolution electron microscopy) in PPI
networks and by small-scale in-depth analysis of
protein-protein interfaces, new insights can be gained with
regard to the rational design of PPI modulators. General
principles about PPIs at the atomic level have been proposed,
for instance, by Janin and Chothia in 1990 [84] or by Jones
and Thornton in 1996.[85] The range of Kd values observed in
biologically relevant processes that rely on PPIs is wide and
extends over about 12 orders of magnitude, from 10⁻⁴ to
10⁻¹⁶ M (overall, the binding energy ΔG between protomers
does not appear to be correlated with the size of the
interface or other interface parameters such as the planarity
and polarity for most PPIs [86]). A key fundamental
distinction between PPIs is their duration (that is, whether
the interaction is permanent or transient; transient
interactions can be further divided into weak and strong
transient interactions). A slightly different definition
expresses the duration as well as the functional aspect,
dividing protein-protein interactions into obligate (the
protomers are not stable on their own in vivo) and
non-obligate complexes [1,25,86–89] (Figure 4).
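
To relate the Kd range quoted above to binding free energies, one can use the standard thermodynamic relation ΔG° = RT ln(Kd / c°). The sketch below assumes a temperature of 298.15 K and a 1 M standard state (both are assumptions made here for illustration); the function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol from Kd in mol/L.

    Uses dG = R * T * ln(Kd / c0) with c0 = 1 M (assumed standard state).
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)


for kd in (1e-4, 1e-9, 1e-16):
    print(f"Kd = {kd:.0e} M -> dG = {binding_free_energy(kd):.1f} kcal/mol")
```

Under these assumptions, the 12 orders of magnitude in Kd correspond to binding free energies from roughly -5.5 to -21.8 kcal/mol, each tenfold change in Kd being worth about 1.4 kcal/mol.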

It is important to note that many PPIs do not fall into
distinct types; rather, a continuum usually exists. Depending
on the types of complexes (permanent, transient...), the
nature of the interface usually differs. For instance,
non-obligate protein-protein complexes have been analyzed and
the interface size measured by the buried surface area
approach has a mean value of 1910 Å², with an average of 204
atoms contributing to this region, belonging to 57 amino
acids, that is about 28 residues per protein.[3] Analysis of
a novel PPI dataset suggests that the minimum protein surface
that must be buried to form a functional complex is in the
order of 900 Å² (about 500 Å² provided by each partner) and
involves about 12 residues on each partner (of course these
values can differ slightly depending on the datasets and the
way the computations are carried out). A large majority of
atoms in non-obligate interfaces are usually still accessible
to the solvent. Relative to the accessible protein surface,
the interfaces of such complexes are depleted in Glu, Asp and
Lys and enriched in Met, Tyr and Trp. The rim, made of
residues in which none of the interface atoms are fully
buried, has a composition close to that of the protein
accessible surface. The core comprises buried atoms and about
55% of all interface residues. This core region is enriched
in aromatic residues and, to a lesser extent, in aliphatic
residues (but not Val, Ala and Pro). Arg residues can be
present in both the core and the rim regions. Another region
was also recently described, the so-called support zone,
which seems similar in composition to the protein
interior.[90] By comparison, other types of interfaces, for
instance those of homodimers, tend to have on average
a buried surface area twice that of the non-obligate
complexes.[3] The interface is here more hydrophobic and
tends to be enriched in aliphatic and aromatic residues, on
average by a factor of 2 as compared to non-obligate
interfaces. Analysis of interfaces can also be carried out in
terms of proteins involved in a given disease, and/or in
terms of hub versus non-hub proteins. For instance, it was
shown that protein-protein complexes and hub proteins in
cancer have smaller, more planar, less tightly packed binding
sites compared to non-cancer proteins (and non-hub proteins),
indicating low affinity and high specificity of the
cancer-related interactions.[91,92]
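
The buried surface area mentioned above is commonly obtained by comparing solvent accessible surface areas (ASA) of the isolated partners with that of the complex. The following is a minimal sketch, assuming the three ASA values have already been computed with a surface calculation program of the reader's choice; the numerical values are hypothetical and the function name is an illustration only.

```python
def buried_surface_area(asa_partner_a, asa_partner_b, asa_complex):
    """Total buried surface area in square angstroms.

    BSA = ASA(A alone) + ASA(B alone) - ASA(A:B complex).
    The result counts the surface lost on both partners together.
    """
    bsa = asa_partner_a + asa_partner_b - asa_complex
    if bsa < 0:
        raise ValueError("complex ASA exceeds the sum of the partner ASAs")
    return bsa


# Hypothetical ASA values (in square angstroms) for illustration only.
total_bsa = buried_surface_area(6000.0, 5500.0, 10600.0)
print(total_bsa)      # total buried area, both partners
print(total_bsa / 2)  # rough per-partner share, assuming an even split
```

Note that some studies report the total buried area while others report the per-partner value, so it is worth checking which convention a dataset uses before comparing numbers.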

Further, within interface regions, in general, not all
residues are equally important and it is possible to use the
concept of hotspots: the binding energy is not equally
distributed among all amino acids participating in the
interaction, and some residues are directly responsible for
the stabilization of the complex. These residues confer most
of the binding energy to the interaction and are typically
defined as those residues contributing at least 2 kcal mol⁻¹
to the total binding energy of the complex.[93] Hotspots tend
to occur in clusters and can belong to the different protein
partners; they are in contact with each other and form
a network of interactions often called hot regions.[25] They
can be identified experimentally but a number of
computational approaches can also be used.[94] It should be
remembered that hotspot residues are not easy to identify
experimentally (e.g., alanine-scanning experiments) or in
silico (see for instance discussions about possible
misconceptions of alanine-scanning results [89,95]). Hotspot
residues (among the most conserved amino acids) are generally
located around the center of the interface, and are protected
from bulk solvent by energetically less important residues
forming a hydrophobic O-ring. Tryptophan (21%), arginine
(13.3%) and tyrosine (12.3%) are often hotspot residues
whereas leucine, serine, threonine and valine tend to be
disfavored.[96–98] The surface area of a region containing
some hotspot residues is around 600 Å², a size that is
compatible with a small molecule (NB: traditional
protein-small ligand interaction ~300-1000 Å² and the solvent
accessible surface of many small molecule drugs usually
ranges from 150-500 Å²), and much smaller than a typical
protein-protein interface (e.g., 1200 to 2000 to well over
3000 Å²).[3] In addition, molecular dynamics studies have
shown that hotspots are relatively rigid as compared to the
surrounding interface residues.[99] Of importance also is the
recent estimation of the number of possible protein
interaction types, estimated to be around 4000.[100] By
looking at the structure of the interface area and through
investigations of the Protein Data Bank,[101] it seems that
the interface space is limited and even chains with different
folds often have similar interfaces. Possibly, the interface
space is close to complete at present, suggesting that
templates for interfaces are probably available in the
current version of the Protein Data Bank (about 100000
protein structures in 2014).[102–104] Further, it is
important to note that many protein complexes seem to be
dominated by a hot segment, where the interaction is
dominated by a continuous epitope, and as such hot segments
could be good predictors of PPI druggability.[34] Still along
this line of attempting to predict PPI druggability in terms
of a region capable of binding a small molecule, a recently
reported study attempts to define classes of PPIs that could
be more easily modulated by low molecular weight compounds
and suggests that the "tight and narrow" and "weak and
narrow" protein-protein complex categories are good
candidates.[105] PPIs can also be classified from a secondary
structure-centric approach but the links with druggability of
the interface are not fully understood at present;[106] yet,
the interactions involving one helix with a binding groove
might be easier to modulate with a small compound than other
types of interfaces.[107]

[Figure 4 panels: Transient (weak); Transient (strong);
Permanent; each annotated with an approximate Kd value]

Figure 4. Transient and permanent interactions. A permanent
interaction is usually very stable and thus generally exists
only in its complexed form.[86,269] A transient interaction
associates and dissociates in vivo. In an obligate PPI, the
protomers are not found as stable structures on their own in
vivo. Structurally or functionally obligate interactions are
usually permanent, whereas non-obligate interactions may be
transient or permanent (often antibody-antigen and
enzyme-inhibitor systems). It is important to note that many
PPIs do not fall into distinct types. Rather, a continuum
exists between non-obligate and obligate interactions. The
strong transient category of interactions illustrates the
continuum that exists between the weak and the more permanent
interactions. This category includes interactions that are
triggered/stabilized by an effector molecule or
conformational change. Some examples are given here to
illustrate these concepts. Obligate: the Arc repressor dimer
(PDB file: 1ARQ) (the Arc repressor of Salmonella
bacteriophage P22 is a dimeric sequence-specific DNA-binding
protein; one chain is shown in dark and the other is painted
in grey); non-obligate permanent heterodimer: insect-derived
double domain Kazal inhibitor rhodniin in complex with
thrombin (PDB file: 1TBQ) (thrombin (black) is a key serine
protease of the coagulation system; the inhibitor is painted
in grey); transient (weak): red abalone lysin dimer (PDB
file: 2lyn) (abalone sperm uses lysin to make a hole in the
egg's protective vitelline envelope; one chain is in dark,
the other is in grey).

Additional information about the importance of
a protein-protein complex and about interfaces could come
from non-synonymous single nucleotide polymorphism (nsSNP)
data (i.e., single base changes leading to a change in the
amino acid sequence of the encoded protein) because many of
these variants are associated with disease. Clearly, the
development of affordable techniques for sequencing genomes
and the application of these approaches will generate vast
amounts of new data, including SNPs. Thus far, studies
looking at the effects of nsSNPs were performed on individual
proteins but the impact of nsSNPs on protein-protein
interactions is now starting to be investigated.[108,109] It
seems that when disease-causing nsSNPs do not occur in
a protein core region, they are preferentially located at
a protein-protein interface rather than in non-interface
regions.[110] These studies could help find rules that assist
the selection of a target. Along this line, the manually
curated SKEMPI database has been developed and contains the
effects of mutation on binding energies for about 2792
mutations across 85 protein-protein complexes.[111] New
insights should definitely come from the analysis of such
a repository.
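
Mutation data of this kind are usually analyzed as a change in binding free energy, ΔΔG = ΔG(mutant) - ΔG(wild type), and the 2 kcal/mol hotspot criterion mentioned earlier in this section can then be applied directly. The sketch below assumes alanine-scanning style data supplied as a plain dictionary; the residue labels, energy values and function names are hypothetical and do not reflect the actual SKEMPI file format.

```python
HOTSPOT_THRESHOLD = 2.0  # kcal/mol, the cutoff quoted in the text


def ddg(dg_wild_type, dg_mutant):
    """Change in binding free energy upon mutation (positive = destabilizing)."""
    return dg_mutant - dg_wild_type


def find_hotspots(scan, threshold=HOTSPOT_THRESHOLD):
    """Residues whose mutation weakens binding by at least `threshold` kcal/mol.

    `scan` maps a residue label to (dG wild type, dG mutant), both in kcal/mol.
    """
    return sorted(
        residue
        for residue, (dg_wt, dg_mut) in scan.items()
        if ddg(dg_wt, dg_mut) >= threshold
    )


# Hypothetical alanine-scanning results for one interface.
toy_scan = {
    "W104": (-12.0, -8.5),   # ddG = +3.5
    "R43": (-12.0, -9.8),    # ddG = +2.2
    "L12": (-12.0, -11.0),   # ddG = +1.0
    "S77": (-12.0, -11.7),   # ddG = +0.3
}

print(find_hotspots(toy_scan))
```

The sign convention matters: with binding free energies expressed as negative numbers, a positive ΔΔG means that the mutant binds more weakly than the wild type.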

2.3 Protein Protein Interfaces and Zones that Could be 
Drugged 

Many different in silico tools can be used to probe PPIs at 
a structural level. There are tools that predict hotspot
residues and methods that suggest the regions of a protein
most likely to be in an interface. Other algorithms attempt
to predict the structure of a protein-protein complex, 
either by docking (guided or not by in silico prediction 
methods of protein-protein interface regions) or by com- 
parative modeling. As interfaces can be flexible, simulation 
tools such as molecular dynamics and normal mode analy- 
sis are of major importance. Further, as small ligands tend 
to bind in cavities, tools to predict binding pockets, to pre- 
dict the druggability of a binding pocket and simulation 
tools (that can unravel transient binding pockets) are also 
needed. Further, it can be of interest to compare and clus- 
ter PPI interfaces to facilitate the design of a compound 
that could bind to several protein complexes or to gain 
knowledge about druggable PPIs. Several of these methods 
will be briefly presented below and it is important to note 
that some tools can be used for several purposes, for in- 
stance, predict interface residues and hotspots or define 
most likely binding pockets for a small compound and hot- 
spots. 

2.3.1 In Silico Predictions of Hotspots and Residues Present at
the Protein Protein Interfaces 

Diverse protein-protein binding site prediction methods 
have been reported (see discussions about these tools 
in C97 ' 112] ) (Figure 5), mostly based on sequence conservation, 
residue propensities, surface topology (planarity and pro- 
trusion), electrostatics, hydrophobicity and solvent accessi- 
bjHty [16,21,23,97,112-114] 5 0me protein-protein binding site pre- 
diction approaches are based on the protein sequence 



Figure 5. In silico tools to study protein interfaces. a) Examples of
software packages that predict interface residues and hotspots. b)
Some tools to predict the 3D structure of protein-protein complexes.

Panel a (PPI interfaces).
Hotspots: ISIS, FOLDEF, ROBETTA, pyDockNIP, MAPPIS, HotPoint, KFC2...
Predict interface residues: ISIS, TreeDet, Promate, Pinup,
InterProSurf, PRISM, Consurf, ET, WHISCY, PIER, I2I-SiteEngine,
PPI-Pred, Cons-PPISP, SPPIDER, Patch Finder plus, Meta-PPISP,
MetaPPI, Pi2PE, SHARP2, ODA, pyDockNIP, iPRED...

Panel b (Predict the 3D structure of a complex).
Protein docking (online): ClusPro, GRAMM-X, ZDock, 3D-Garden,
Hex server, PatchDock, Haddock, RosettaDock, FiberDock, FireDock,
pyDockWeb, KBDock, ZDOCK server, DOCK/PIE(RR)...
Template-based docking (online): Interactome3D, InterPreTS,
SPRING, COTH, TACOS, PrePPI, Coev2Net, Struct2Net, iWrap, PRISM,
HOMCOS...

alone, like ISIS (interaction sites identified from sequence,
a neural network approach) [115] and PPIcons
(even if the tool used some structural information
during training) [116] or SPPIDER (which runs with or without
information about the 3D structure). [117] It has however been
noted that methods that use structural information tend to
be more accurate than sequence-based approaches (see
for instance the literature [46]). Other tools aiming at predicting
interface regions require the structure of the protein(s).
Some methods use empirical scoring functions, like ODA
(see for review the literature [97]), other approaches use
sequence conservation among other parameters, like Promate, [118]
while others make use of machine learning techniques,
like SPPIDER. [117] Meta-servers combining different
tools have also been developed, such as meta-PPISP. [119]
Protein interfaces can also be probed by docking. [113]
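
To make the meta-server idea mentioned above more concrete, the following
minimal Python sketch averages per-residue interface scores coming from
several predictors and keeps the residues above a cutoff. It is only an
illustration under stated assumptions: the tool names, the toy scores
(assumed to be already scaled to the 0-1 range) and the 0.5 cutoff are
hypothetical, and this simple averaging is not the actual combination
scheme of meta-PPISP or of any other published server.

```python
from statistics import mean


def consensus_interface(scores_by_tool, cutoff=0.5):
    """Average per-residue scores across tools.

    scores_by_tool maps a tool name to {residue_number: score in [0, 1]}.
    Only residues scored by every tool are kept. Returns the consensus
    score per residue and the sorted list of residues at or above cutoff.
    """
    per_tool = list(scores_by_tool.values())
    common = set(per_tool[0]).intersection(*per_tool[1:])
    consensus = {res: mean(s[res] for s in per_tool) for res in sorted(common)}
    predicted = [res for res, score in consensus.items() if score >= cutoff]
    return consensus, predicted


# Hypothetical scores for three residues from three hypothetical predictors.
toy_scores = {
    "tool_a": {10: 0.9, 11: 0.2, 12: 0.7},
    "tool_b": {10: 0.8, 11: 0.1, 12: 0.4},
    "tool_c": {10: 0.7, 11: 0.3, 12: 0.6},
}

consensus, predicted = consensus_interface(toy_scores)
print("Predicted interface residues:", predicted)
```

In practice the individual predictors would need to be calibrated against
each other before such an average is meaningful, and any predicted patch
would still require experimental validation.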

Other tools developed specifically to predict hotspots
(the experimental approach commonly used is alanine
scanning) generally require the 3D structure of the protein
complex and can use, for instance, an empirical scoring
function to assess the interface (e.g., HotPoint [120]) while
others are energy-based, like ROBETTA (or ROSETTA), [121]
FoldX [122] or iPred. [123] Other methods for hotspot prediction
use the unbound protein structures of each partner of
a complex and docking (e.g., the module pyDockNIP of the
pyDock software package [97,124]). Hotspots can also be
investigated by molecular dynamics in water and in an
isopropanol/water cosolvent environment (see for instance the
literature [125]). Very few methods based on the sequence alone
have been reported to predict hotspots, yet the ISIS method
mentioned above has also been applied to predict hotspot
residues.

2.3.2 Protein-Protein Docking and Template-Based (and
Threading) Structural Predictions of Protein Complexes

A major difficulty in the field of PPI modulation by small
molecules has been the lack of structural knowledge about
the individual proteins forming the complex or about the
macromolecular complex itself, and the fact that some PPIs
involve at least one partner (or one region) that is
intrinsically disordered. [126,127] At present, about 100000
experimental structures are reported at the PDB (the percentage
of protein-protein complexes is still very low) and it is
possible to build homology models for numerous individual
proteins, while the second generation structural genomics
initiatives together with advances in in silico protein-protein
interaction predictions should improve the situation with regard
to getting structural information about protein complexes
in the coming years. [46,112,113,128] Protein-protein docking
approaches and template-based structure modeling tools
can indeed be used to propose models of the complex [129-131]
(Figure 5) that are reliable, or that at least include some
possible solutions that will need to be validated experimentally.
Yet, because of the complexity of the problem, these tools
usually benefit from the knowledge of predicted interacting
residues (e.g., to perform docking under restraints),
site-directed mutagenesis and other experimental information
such as SAXS or electron microscopy. Many protein-protein
docking engines have been reviewed, for instance in [23,128],
while some new protein-protein docking tools released
(or optimized) in 2013 include DockTrina (for docking
triangular protein trimers), [132] ATTRACT, [133] MEGADOCK, [134]
pyDockWEB, [135] F(2)Dock 2.0 and GB-rerank, [136]
and SwarmDock (incorporating flexibility). [137] These
approaches can also benefit from new scoring functions, as
illustrated by the combination of DockRank and the
protein-protein docking tool ClusPro. [138] The other approach
to build a protein complex is to use template-based modeling,
which constructs a complex by copying and refining
the structural framework of related protein-protein complexes
known experimentally. A list of in silico methods has
been recently reported in the literature [130,131] and includes
for instance TACOS (Template-based Assembly of Complex
Structures) [139] or the Struct2Net server. [140]



2.3.3 Binding Pocket Prediction for Small Molecules and 
Hotspots, Druggability and Clustering of PPIs 

Most therapeutic targets (e.g., enzymes, GPCRs, ion channels)
usually display a clear concave binding pocket that
can bind a small molecule. While having at hand the 3D
structure of a protein-protein complex is very useful, it is
still possible to design small PPI modulators even if one
has only the 3D structure of one partner (experimental or
homology model) of the complex. Several tools have been
developed to predict binding pockets and to assess the
druggability of these pockets (here defined as the likelihood
of finding high-affinity low molecular weight binders, also
called ligandability, [87] a term first coined by Edfeldt et al., [141]
yet the term bindability can also be used [142]). The tools
were essentially developed for regular targets but such
methods can still be applied (with caution) to PPIs. In
general, PPIs have not evolved to bind a low molecular
weight chemical compound; interfaces tend to be flat,
relatively large and often lack a clear ligand-binding cavity, but
protein-protein interfaces that bind small molecules are
often found to possess regions with 3 to 5 subpockets (see
below) [143] and it has also been found that binding pockets
may not be directly at the interface but within 6 Å of the
interface [102] (of interest, we also found that small ligand
binding pockets can be found near the amino acids of
a protein domain interacting transiently with a membrane
surface; [144] could small ligand binding pockets be present
next to most macromolecular interfaces?). Also, protein-protein
interfaces tend to dynamically adapt to incoming
ligands (small molecules or large macromolecules), and transient
cavities not visible in some experimental structures can
appear on the molecular surface during (or prior to) the
binding event. [125] In such cases, while the flexibility at the
interface poses a significant challenge for structure-based
drug design approaches, molecular simulation tools can
assist and complement X-ray or NMR studies. [145-148]

Binding pocket detection algorithms are essentially
subdivided into two major classes, geometry-based and
energy-based tools. [149] In addition to predicting binding
pockets, some tools also provide a druggability score, that is,
they score and rank the pockets for their likelihood
to bind a low molecular weight drug-like compound
(which can be different from reporting a list of cavities). In
general, methods work on a static protein structure but
some also take into account protein flexibility. Several
recent articles or reviews describe these tools and the
underlying concepts. [112,149-156] Of importance is the
observation that protein-ligand binding hotspots in PPIs seem to
correlate with protein-protein hotspots. [88]

Because the identification of small molecule binding
sites within or near protein-protein interfaces can be
difficult with conventional methods, other tools geared
toward PPIs were developed. For example, methods were
developed that probe a protein surface (and can be tuned for
PPIs) with organic fragments and predict the locations of
likely binding regions based on where fragments interact
with high affinity. Tools like FTMAP [157]
(computational mapping with 16 different chemical probes)
and the FTFLEX extension of FTMAP (which takes into
account side chain flexibility on the fly) [158] can be used to
pinpoint a region that can be explored by structure-based
virtual screening approaches and for hotspot prediction,
while if multiple structures are available (or obtained via
molecular dynamics), FTProd [159] could be applied. In fact, it
has been shown that when a main hotspot region at a
protein-protein interface has a concave topology, with one or
two additional hotspots close enough to be reached from
the first main hotspot site by a drug-sized molecule, then
the region is likely to be druggable. [95,157]

Related to hotspots, and to tools that could help
target PPIs with small organic molecules, is the concept of
'anchor' sites, which, contrary to hotspots, have explicit
concave/convex geometries appealing for pharmaceutical
intervention (anchors can also be hotspots). The
ANCHOR tool was developed along this concept to assist
the design of small molecule modulators of PPIs. [160] Another
related concept is the notion of druggable interface. The
2P2I scoring function has been specifically designed to
investigate interfaces and suggest interfaces that could be
drugged. [161]

Another tool dedicated to PPIs identifies and ranks clusters
of interface residues in a PPI that are most suitable as
starting points for rational small-molecule design. These
clusters are called Small-Molecule Inhibitor Starting Points
(SMISPs) and the approach is complementary to methods
that identify binding sites through an analysis of the receptor
surface (either through shape descriptors or chemical
probes). The PocketQuery web service has been developed
around this concept to predict hotspots, anchor residues
and hot regions. [162] In the same study, the authors estimate,
after a PDB-wide analysis, that about 48% of the protein
complexes could be modulated with a low molecular
weight molecule. A related concept involves the investigation
of overlaps between small-molecule and protein binding
sites within families of protein structures (i.e., bi-functional
sites; so far about 8000 proteins from the human
proteome have been annotated with bi-functional residues [163]).
Davis et al. [164] reported the HOMOLOBIND software, [163]
a tool that identifies residues in protein sequences
with significant similarity to structurally characterized binding
sites.

These tools tend to work on a static structure (although
one can generate alternative conformations prior to the
computations) while some others combine the identification of
hotspots by MM-PBSA free energy decomposition, on the
basis of the structural ensemble generated by molecular
dynamics (MD), with the generation of transient pockets using
molecular dynamics and FRODA simulations. [165]

Tools to compare traditional binding pockets have been
developed [149] and some examples of recently reported
approaches include PocketAnnotate, [166] APoc [167] and
SiteComp. [168] Such pockets have been stored in databases
like, for instance, the Pocketome. [169] Somewhat related, in
a recent study, interfaces were defined and clustered, leading
to the identification of 22604 unique interface structures
in the PDB. [170]



3 PPIs are Challenging but Should be 
Tractable Molecular Targets: Supports from In 
Silico Methods 

Despite the therapeutic relevance of PPIs, most small molecule
drugs do not in general hit PPIs but rather enzymes, ion
channels, nuclear hormone receptors and G-protein coupled
receptors. In fact, these last 50 years, PPIs have been
essentially modulated by therapeutic antibodies, therapeutic
proteins and peptides (or modified peptides or, more recently,
stapled peptides). [7,13,20,35,106,171] However, while biologics
can possess outstanding qualities and be valuable in
some pathological conditions, [38] some of these molecules
tend to be problematic for at least three reasons: [40,172] (a)
most of them are difficult or impossible to administer
orally with our present knowledge and can be unstable, (b)
adverse immune reactions can occur, [173] and (c) biologics
are usually expensive to develop, and/or produce, and/or
store, with treatment for one patient easily reaching over
$100000 per year [174] (a cost that most healthcare systems
are not able to afford, and the associated problem of aligning
the cost of small chemical compounds to the cost of biologics;
the cost of drugs is a very controversial issue and
it should be mentioned here that small chemical compounds
can now also be extremely expensive, such as the recently
approved prodrug Sofosbuvir for the treatment of
hepatitis C infection, thus the debate about cost is far from
being closed). Although significant advances have been
made and will take place in the coming years, [175] several
obstacles will have to be overcome, from cost to delivery
issues. [21] It is interesting to note here that small molecules
and biologics can be combined (e.g., a small molecule can
be given with a monoclonal antibody (mAb), or the grafting
of a small molecule to a protein, including a mAb, can be
valuable in some cases). Along the same line, small molecules
could be given together with biologics, not to gain
a synergistic effect but rather to allow proper functioning
of the biologics, for instance to avoid aggregation of
a monoclonal antibody by using a small molecule PPI inhibitor
(e.g., proof of concept study with the mAb bevacizumab
or Avastin). [176]

There are many reasons that have led to the widespread
perception that modulation of PPIs could not be addressed
by small drug-like molecules, as mentioned earlier in this
review: PPI interfaces are often considered as large, flat and
lacking a well-defined ligand-binding cavity characteristic
of conventional targets [12,22,23,87,177] (these features obviously
do not apply to all interfaces); the lack of appropriate
high-throughput screening (HTS) technologies and/or the very
low hit rates observed upon running HTS experiments [26,28]
(in our hands, several PPI targets were screened experimentally
with a "traditional compound collection" and the hit
rates were around 0.01% compared to 0.1 to about 5% for
the screening of a traditional target, yet it can be higher,
see some examples in the recent review of Nero et al. [107]).
Another type of problem concerns the compound collections
used to screen PPIs. Indeed, the paucity in traditional
screening libraries of small molecules dedicated to PPIs,
reinforced by the observation that small PPI hit molecules
often have physicochemical characteristics slightly outside
of what would be expected for a typical oral drug starting
point, suggests that novel screening collections have to be
designed for PPIs. […] Thus, many in silico methods
have been developed to assist the design of novel
compound collections.
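
The practical consequence of the hit rates quoted above can be illustrated
with a short calculation. The sketch below is only a back-of-the-envelope
example: the library size of 100000 compounds is a hypothetical value chosen
for illustration, and the function name is ours; only the percentages
(around 0.01% for PPI targets, 0.1 to about 5% for traditional targets)
come from the text.

```python
def expected_hits(library_size, hit_rate_percent):
    """Expected number of primary hits for a hit rate given in percent."""
    return library_size * hit_rate_percent / 100.0


# Hypothetical screening deck size (assumption, not taken from the review).
LIBRARY_SIZE = 100_000

scenarios = [
    ("PPI target (about 0.01%)", 0.01),
    ("traditional target, low end (0.1%)", 0.1),
    ("traditional target, high end (about 5%)", 5.0),
]

for label, rate in scenarios:
    hits = expected_hits(LIBRARY_SIZE, rate)
    print(f"{label}: about {hits:.0f} expected hits")
```

With these assumptions, a traditional collection yields only a handful of
starting points for a PPI target, which is one quantitative argument for
designing PPI-focused collections.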

If we take PPI inhibitors as an example, there are three main
ways a small compound can block an interaction: direct
orthosteric inhibition, where the compound binds at sites
that overlap with the area of the protein that interacts with
the other protein of a complex; allosteric inhibition, where
the small compound binds away from the interface area (it
can be close to the interface or at a very different site); and
interfacial inhibition, where the ligand binds to a transient
pocket appearing for example during a conformational
change and locks the protein complex in a nonproductive
conformation (the targeted complex is a transient kinetic
intermediate that is characterized by unbalanced energetic
and structural conditions that create the binding site for
the drug). [180,181] All approaches have pros and cons and
have to be considered, but some types of allosteric
inhibition and, in general, interfacial inhibition are definitely
difficult to predict in silico. […]

3.1 PPI Compound Collections and ADMET 

The earliest efforts to develop small protein-protein modulators
were based on the mimicry of secondary structure elements
of the interacting partners and thus aimed at compounds
able to mimic beta-turns, alpha helices and beta
strands. [13,20] Such approaches are still valuable today. [171]
Indeed, the greatest successes for HTS have been with PPIs
in which a helix of one protein binds into a groove of the
interacting partner (e.g., the Bcl family), [107] illustrating the
potential of compounds mimicking such secondary structure
elements. As low molecular weight molecules addressing
PPIs were identified and collected, it became possible to
characterize the key properties and structural features of
these compounds (a principal component analysis of chemical
vendor collections versus PPI inhibitors and allosteric
inhibitors is reported in Figure 6; it should be mentioned
that at least 4 chemical vendors now provide collections
dedicated to PPIs, Asinex, Chemdiv, Life Chemicals and
Otava, but in our hands, a preliminary PCA indicates
relatively similar trends even with these specialized
collections). At present, at least 3 databases are dedicated
to modulators of PPIs: 2P2Idb (manually curated), [161]
TIMBAL [184] and iPPI-DB (manually curated). [185] Exploring
and navigating these collections should help to gain insights
into privileged scaffolds or substructures particularly
well suited to bind at the PPI interface as well as required
physicochemical thresholds, and could help to derive new
rules to design ADMET-friendly collections dedicated to
PPIs. [12,15,18,22,177-179,184,186] Two research groups provide in
silico filters that help to design PPI-focused libraries enriched
in PPI inhibitors starting from large traditional compound
collections. A decision tree approach was used by
Reynes et al. [187] (PPI-HitProfiler, which is now available
online via the ADME-Tox filtering tool FAF-Drugs2 [188]) while
support vector machines were used by Hamon et al. [189]
(2P2I HUNTER). Two studies reporting the rational design of
compound collections dedicated to PPIs that contain alpha
helical binding epitopes, integrating the concept of increasing
the three-dimensionality of the compounds, have been
reported recently. [27,171] Along the same line but combining
5 physicochemical properties, a filter to select potential
allosteric inhibitors was published together with an online
server. [190]

With regard to the main physicochemical properties of
known PPI inhibitors and potential ADME-Tox problems,
investigation of known PPI binders showed that the molecules
tend to have a higher molecular weight (average MW
of 421 Da for protein-protein inhibitors versus 341 Da for
regular drugs), higher log P (a mean value of about 5.1 for
protein-protein inhibitors was found while it is around 3.5 for
enzyme inhibitors) and a more complex three-dimensional
structure than typical drugs, underlining further the need
to rationally design the screening collection. Yet, this
general view does not apply to all PPI modulators. [18,22,177,178,185,191]
For example, many compounds that are
known to inhibit PPIs tend (because of some known
physicochemical properties, e.g., high lipophilicity) to violate
several rules of thumb commonly used to select compounds
after screening, to prepare compound collections
or to predict bioavailability or toxicity. [128,192-195] Such
rules include the Lipinski rule of five (initially related to oral
administration) [196] or the 3/75 rule (related to in vivo toxicity),
which states that compounds with high lipophilicity

Figure 6. Principal Component Analysis (PCA) on commercial and allosteric databases and a database of inhibitors of protein-protein
interactions. A chemical vendor database, an allosteric database (version 2013) [270] and the protein-protein interaction inhibitor database
iPPI-DB (version 2013) [185] were used in this analysis. Redundant compounds, amino acids, salts, compounds with less than 10 or more than
140 heavy atoms, compounds with a molecular weight less than 150 g/mol or more than 1000 g/mol, compounds with a logP less than -10 or
more than 10, carbocations, compounds with phosphorus, and non-organic compounds were removed with PipelinePilot v.8.5. A diversity
criterion was applied on our dataset using Bemis and Murcko assemblies on PipelinePilot v.8.5. The dataset contained 421522 compounds for
the commercial database, 5141 allosteric compounds and 685 protein-protein inhibitor compounds. The PCA calculations were run using
18 physico-chemical properties. (a) PCA of commercial database, allosteric database and iPPI-DB: individual map of compounds. PPI
inhibitors are represented as color dots, commercial compounds as black dots and allosteric compounds in
grey. (b) PCA of commercial database, allosteric database and iPPI-DB: variable map of axes 1 and 2. (c) PCA of commercial database,
allosteric database and iPPI-DB: variable map of axes 1 and 3. (d) iPPI family colors used for the PCA. The first three axes of the PCA represent
62% of the total variance. The first axis is represented by the compound's aromaticity (number of benzene-like rings and number of sp2
carbons) and the compound's complexity (number of sp3 carbons and Csp3 ratio). The second axis is characterized by the number of
hydrogen bond donors/acceptors. The third axis is characterized by the compound's polarity (here evaluated by TPSA, topological polar surface area).
The global position of the protein-protein interaction inhibitor population is the upper part of the individual map, showing that these
compounds seem to be more hydrophobic and more aromatic. According to the global position of the commercial and the allosteric
compounds, the two datasets seem to share the same chemical space, even if a small number of allosteric compounds present an unusual
profile. This tendency was confirmed on 5 similar analyses with different commercial datasets from different chemical vendors.



(computed log P > 3) and low topological polar surface
area (TPSA < 75) can have an increased risk of generalized
toxicities (about 6 times more likely to be toxic in short-term
animal studies). [197] In fact, because of some of these
physicochemical properties, some PPI modulators may fit
the so-called class II (low solubility, high permeability) or
class IV (low solubility, low permeability) category of the
Biopharmaceutics Classification System (see for review the
literature [198]). Along the physicochemical properties line of
reasoning and rules of thumb, a GSK team showed that
increasing lipophilicity usually contributes to lower drug
efficiency and consequently such molecules tend to require
higher doses, which in turn can increase the risk of adverse
drug reactions (e.g., increased promiscuity leading to
increased binding to anti-targets). [199] Further, knowing that
starting hits usually have to grow in size during the compound
optimization phase, with in general a further increase
in log P (in fact compounds that have more chance
to succeed tend to preserve a relatively low lipophilicity
during the optimization program), [200] a clever design of PPI
screening collections is required so as to obtain compounds
with balanced ADME-Tox properties still compatible
with the binding site features present at or near the
protein-protein interfaces. Other difficulties could occur if
a PPI compound has to hit a CNS target since, in general,
the CNS physicochemical property ranges for molecular
weight are around 300 and for log P, they are around
2.8. [201] However, although the ADME-Tox properties of PPI
modulators are a legitimate concern, we have recently reviewed
several PPI interfaces that can be modulated by hits
that match the generally accepted rule of five-like guidelines. [18,185]
For the remaining molecules, it is true that some
PPI modulators have a higher log P (but many have an
appropriate log P to start drug discovery) and molecular
weight than many drug candidates that hit traditional targets,
yet for the PPI modulators that have reached clinical
trials, several seem to be orally available. [202] In addition,
many studies highlight that some of these physical chemistry
rules might be too restrictive (e.g., [203,204]) as, for instance,
larger compounds could reduce their effective size
and lipophilicity through hydrophobic collapse or by forming
internal hydrogen bonds, thereby enhancing membrane
permeability and possibly impacting the overall
bioavailability. [204-207] Furthermore, it has been stated that small
PPI inhibitors tend to use more aromatic interactions than
the corresponding protein partners, which also utilize several
charged residues, suggesting that new PPI modulators
could possess more charged groups (this could improve
some ADME-Tox properties while deteriorating others like
permeability, but this information is important to explore). [208]
A tremendous amount of work is needed in this
area, while inspiration from natural compounds could help
in understanding how to rationally go beyond several ADME-Tox
rules of thumb. [209,210] Overall, it is likely that such obstacles
will be overcome in the coming years. [18,183] In silico approaches
that make use of multi-parameter optimization
protocols should indeed facilitate the design of molecules
with balanced ADME-Tox properties, adequate potency and
relevant selectivity tuned to the disease type. [211] Gains
in knowledge will definitely come from the analysis of compounds
that are in advanced preclinical stages and in clinical
trials. At present only around 30-50 compounds [185] are
at these stages, but the increased research activity in this
area should rapidly bring new insights that will favor rational
and quality by design approaches.
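
The rules of thumb discussed in this section can be expressed as simple
descriptor checks. The Python sketch below is a minimal illustration, not
the implementation of any tool cited here. It assumes that descriptors
have already been computed with some cheminformatics package; the class and
function names are ours, the example compound is hypothetical, and the
tolerance of one rule-of-five violation used for the "soft" filter is an
arbitrary assumption. The rule-of-five cutoffs are the usual ones (MW at
most 500, log P at most 5, at most 5 hydrogen bond donors and 10 acceptors),
and the 3/75 rule uses the thresholds given in the text (log P > 3 and
TPSA < 75).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    mw: float    # molecular weight (Da)
    logp: float  # computed log P
    hbd: int     # number of hydrogen bond donors
    hba: int     # number of hydrogen bond acceptors
    tpsa: float  # topological polar surface area


def rule_of_five_violations(d):
    """Count how many of the four rule-of-five criteria are violated."""
    return sum((d.mw > 500, d.logp > 5, d.hbd > 5, d.hba > 10))


def flagged_by_3_75(d):
    """True if the compound falls in the higher-risk 3/75 region."""
    return d.logp > 3 and d.tpsa < 75


def passes_soft_filter(d, max_ro5_violations=1):
    """Soft filter: tolerate a limited number of rule-of-five violations."""
    return rule_of_five_violations(d) <= max_ro5_violations


# Hypothetical PPI hit, slightly outside the classical drug-like space.
hit = Descriptors(mw=540.0, logp=4.6, hbd=2, hba=7, tpsa=68.0)
print("Rule-of-five violations:", rule_of_five_violations(hit))
print("Flagged by 3/75 rule:", flagged_by_3_75(hit))
print("Passes soft filter:", passes_soft_filter(hit))
```

Reporting the number of violations and the 3/75 flag separately, instead
of rejecting compounds outright, keeps the filter soft and leaves the final
decision to the medicinal chemist.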



3.2 Virtual Screening of PPIs 

PPIs can be probed using HTS or virtual screening experiments
followed by in vitro assays of a small selected list of
molecules resulting from these computations. The term
"virtual screening" (or in silico screening) was first reported
in the scientific literature in 1997; [212] it can be defined as
a set of computer methods that analyze large databases
or collections of compounds in order to identify and prioritize
likely hit candidates. [128,213-223] In silico screening
can be performed on libraries that contain physically existing
compounds, on PPI-enriched focused collections or on
virtual libraries, and thus on compounds that are not yet
synthesized. Noteworthy is the fact that virtual screening
can be used on very large databases that no experimental
approach can tackle. It is important to remember that
the easily accessible drug-like space contains about 10^33
molecules [224] and that with 17 atoms and simple chemistry
rules, it is already possible to generate 166 billion compounds. [225]
Yet, it should be noted that in silico screening
goes much beyond number crunching: it helps to generate
ideas, to reduce cost and to gain knowledge. In silico
screening experiments can be performed to complement
HTS (and are indeed often integrated in screening campaigns),
prior to experimental screening, or after HTS to
rescue some compounds potentially missed by the in vitro
readouts (see latent hits by [226]). [227] The complementarity
between HTS and virtual screening has been shown in
many studies, for instance by screening both in silico
and experimentally the same 198000-compound collection
against cruzain, a cysteine protease target for Chagas disease. [228]
Along the same line, a computer screening experiment
performed on a subset of the ChemBridge compound
collection (about 500000 molecules) and a study making
use of HTS (50000 molecules, also using molecules from
ChemBridge) found quasi-identical hit molecules for the
proteasome cancer target. [229,230]

Virtual screening approaches have been traditionally subdivided
into two main methods [231,236] (Figure 7): first,
ligand-based screening, in which 2D or 3D chemical structures
or molecular descriptors of known actives (and sometimes
inactive molecules) are used to retrieve other compounds
of interest from a database using some type of
similarity measure or by seeking a common substructure or
pharmacophore between the query molecule and the compounds
in the database; and second, structure-based (or
3D receptor-based) screening, in which compounds from
the database are docked into a binding site (or over the
entire surface) and are ranked using one or several scoring
functions. Structure-based virtual screening also includes
tools to perform binding site-derived pharmacophore
searches. There are some slight differences in how the methods
are classified but the nomenclature used here is generally
well accepted. [217] Of importance for PPIs is that
structure-based screening can be carried out on homology
models or on low-resolution structures. [237-239]
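
As an illustration of the ligand-based branch described above, the
following Python sketch ranks a small library against a known active using
the Tanimoto coefficient, one commonly used similarity measure for binary
fingerprints. It is a toy example under stated assumptions: fingerprints
are represented as sets of "on" bit positions, the bit values and compound
names are invented, and in a real campaign the fingerprints would be
computed from chemical structures with a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query_fp, library, min_similarity=0.0):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    kept = [pair for pair in scored if pair[1] >= min_similarity]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Invented fingerprints: one known active (query) and three library compounds.
query = frozenset({1, 4, 7, 9})
library = {
    "cpd_A": frozenset({1, 4, 7, 9, 12}),
    "cpd_B": frozenset({2, 3, 5}),
    "cpd_C": frozenset({1, 4, 20, 21}),
}

for name, similarity in rank_by_similarity(query, library, min_similarity=0.3):
    print(f"{name}: Tanimoto = {similarity:.2f}")
```

The similarity cutoff (0.3 here) is arbitrary; its appropriate value depends
on the fingerprint type and on how much scaffold diversity is desired among
the retrieved compounds.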



© 2014 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim

Mol. Inf. 2014, 33, 414-437



[Figure 7 diagram. Virtual screening divides into two branches.
Ligand-based screening: descriptor-based (e.g., properties, binary structural keys, fingerprints); graph-based (e.g., maximum common subgraphs); pharmacophore-based.
Structure-based screening: docking (stochastic, fragment-based/incremental, shape-based); scoring (force-field based, knowledge-based, empirical, consensus, target-based, fingerprint, MM-PB(GB)SA, LIE, QM/MM, QM).
Listed between the two branches: pharmacophore, QSAR, network pharmacology, proteochemometrics.]


Figure 7. Main virtual screening methods. The two main virtual screening methods are ligand-based methods and structure-based methods.
Some approaches can be considered to be at the interface between the two main screening concepts, such as some types of pharmacophore
modeling that use information derived from co-crystallized target-ligand complexes, or proteochemometric modeling,
QSAR and systems pharmacology. Abbreviations: LIE, linear interaction energy; MM-PB(GB)SA, molecular mechanics-Poisson Boltzmann
(Generalized Born) surface area; QM/MM, quantum mechanics/molecular mechanics; QM, quantum mechanics. Additional information
can be found in some recent reviews about virtual screening, [217] fragment-based approaches, [271] structure-based tools for screening
and compound optimization, [272] systems pharmacology [128] or pharmacophores. [245]



The structure-based virtual screening process can then
be continued, if deemed appropriate, using different types
of post-processing approaches (see for instance the literature
[214,215,236,240-242]). Ligand- and structure-based methods
can be combined if the necessary information is available. [243]
Virtual screening methods are relatively well-established,
and numerous success stories in terms of hit identification,
contribution to the development of drug candidates,
or marketed products have been recently reviewed. [233,244]
This does not mean that the methods have
no flaws, yet they contribute significantly to the identification
of interesting molecules. [231] Over 100 commercial
and free tools are available to carry out virtual screening;
many of these approaches have been discussed in several
recent reviews. [30,128,232,242,245,246]

A compound collection is required to perform virtual screening and its preparation is, as mentioned above, critical in the case of PPI screening.[22,178] Physicochemical properties, structural alerts and flags for promiscuity should in general be considered. This is also important because molecules have to be optimized[234] and because it has been noticed that artifact compounds (e.g., PAINS, pan assay interference compounds) are reported at a growing rate[128,247,248] (note, however, that some authors do not find some PAINS molecules that problematic[249]). In silico tools such as the FAF-Drugs online server[188] can assist in the preparation of a compound collection and, for instance, evaluate physicochemical properties, search for the presence of PAINS and toxicophores, and assess the potential of a compound (the molecule has to be in 3D) to be a protein-protein interaction inhibitor according to the rules defined in the literature.[187] It is important to note that when searching for PPI modulators it might be necessary to apply soft in silico ADME-Tox filters to prepare the collection. For example, chemical groups that could react with a protein to form a covalent bond are usually not welcome in a drug discovery program, yet this could be useful when probing a PPI, as in the case of inhibitors of the interaction between the thyroid hormone receptor and co-regulator proteins.[250]
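The idea of a soft filter can be sketched in a few lines of Python. This is only an illustration and not the FAF-Drugs implementation: the `Compound` record, the `soft_filter` function and every numeric cut-off below are hypothetical placeholders, and the descriptors and PAINS/reactivity flags are assumed to have been computed beforehand with a cheminformatics toolkit.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Compound:
    """Precomputed descriptors for one molecule (all fields are assumed inputs)."""
    name: str
    mol_weight: float     # molecular weight in Da
    logp: float           # computed octanol/water partition coefficient
    hb_donors: int        # number of hydrogen-bond donors
    hb_acceptors: int     # number of hydrogen-bond acceptors
    pains_alert: bool     # True if a PAINS substructure search flagged the molecule
    reactive_group: bool  # True if a potentially covalent group is present


# Hypothetical, deliberately permissive cut-offs (NOT values taken from the review).
SOFT_LIMITS = {"mol_weight": 700.0, "logp": 6.0, "hb_donors": 6, "hb_acceptors": 12}


def soft_filter(c: Compound, max_violations: int = 1,
                allow_reactive: bool = False) -> Tuple[bool, List[str]]:
    """Return (keep, reasons) for one compound.

    A few property violations are tolerated ("soft" filtering), PAINS-flagged
    compounds are rejected, and reactive groups are rejected unless the user
    explicitly allows them (e.g. when a covalent probe is acceptable).
    """
    reasons = [f"{field} > {limit}" for field, limit in SOFT_LIMITS.items()
               if getattr(c, field) > limit]
    keep = len(reasons) <= max_violations
    if c.pains_alert:
        reasons.append("PAINS alert")
        keep = False
    if c.reactive_group and not allow_reactive:
        reasons.append("reactive group")
        keep = False
    return keep, reasons


# Toy library with invented descriptor values.
library = [
    Compound("cpd_A", 450.0, 3.5, 2, 6, False, False),
    Compound("cpd_B", 750.0, 3.0, 2, 6, False, False),   # one violation: tolerated
    Compound("cpd_C", 800.0, 7.0, 2, 6, False, False),   # two violations: rejected
    Compound("cpd_D", 450.0, 3.5, 2, 6, False, True),    # reactive group
]
for cpd in library:
    print(cpd.name, soft_filter(cpd))
```

Calling `soft_filter(cpd, allow_reactive=True)` keeps compounds such as `cpd_D`, which mirrors the point made in the text that reactive groups may be acceptable when the goal is to probe a PPI.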

While all virtual screening approaches can be used for PPIs, some, like pharmacophores derived from protein-protein interfaces,[245] seem well suited. Other tools like docking-scoring can be used although they have not been designed to target PPI pockets (the docking step can be affected by the lack of a well-defined binding cavity and the scoring step is always sensitive). In a recent study performed over several ligand-docking engines, it was found that good docking solutions could be obtained using conventional docking-scoring tools (yet a drop of about 10% was noticed as compared to regular pockets[251]), suggesting that structure-based screening can already assist the design of PPI modulators although additional methodological developments are required.[252]



4 PPI Low Molecular Weight Hit Discovery by 
Combining In Silico and In Vitro Approaches 

Modulating PPIs with a small molecule could be beneficial in many cases, even more so if the small molecule can be administered orally.[12,19,253] First, this could open new avenues for therapeutic intervention (see some recent reviews discussing the combination of in silico, in vitro and in some cases in vivo studies to assist the design of hits and/or clinical candidates or even PPI drugs, such as [18,22,23,29,171,178,181,183]). Further, as discussed in recent reviews, designing catalytic site inhibitors can be limited by high structural similarity among enzymes of the same family, whereas the greater structural variability of protein-protein interfaces may provide a real opportunity for selectivity. Also, PPI modulators could be less prone to drug resistance than catalytic site inhibitors. In addition, even if two proteins bind with high affinity, they could be successfully out-competed by a weak small molecule binder. Indeed, in some cases, the mere alteration of a binding equilibrium will be sufficient to produce a significant biological effect without the need to completely inhibit the selected PPI. Several drugs that target PPIs are already in clinical use: tirofiban (mainly orthosteric) targets integrins (glycoprotein (GP) IIb/IIIa receptors on platelets) to treat cardiovascular disease, and maraviroc (mainly allosteric) interferes with the interaction of HIV gp120 with the CCR5 receptor and blocks HIV viral entry (see for instance the literature[23]). At least 30 molecules are in preclinical or clinical stages[185] and target systems such as IL2 and IL2R, Bcl2 and Bcl-XL, HDM2 and p53, the E2 viral transcription factor and the viral helicase E1, the ZipA membrane-anchored protein and the FtsZ tubulin, and the TNF (tumor necrosis factor) trimers. These pioneering PPI systems have been discussed extensively in numerous recent reviews and we will thus focus here on two new complexes.
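The statement that shifting a binding equilibrium can be enough may be illustrated with a textbook competitive-binding model. The sketch below is an assumption-laden illustration, not a method from the cited work: it assumes a single binding site, strictly competitive (orthosteric) binding, and free concentrations that are close to the total concentrations (target protein present in trace amounts). The function name `fraction_in_complex` and all numerical values are invented for the example.

```python
def fraction_in_complex(partner_conc: float, kd_partner: float,
                        inhibitor_conc: float = 0.0,
                        ki_inhibitor: float = 1.0) -> float:
    """Fraction of the target protein bound to its protein partner.

    Single-site competitive model: the target is either free, bound to the
    partner protein, or bound to the small molecule. All concentrations and
    dissociation constants are in mol/L and refer to free concentrations.
    """
    p = partner_conc / kd_partner        # occupancy weight of the partner protein
    i = inhibitor_conc / ki_inhibitor    # occupancy weight of the small molecule
    return p / (1.0 + p + i)


# Invented example: tight protein partner (Kd = 10 nM, present at 10 nM)
# versus a weak small molecule (Ki = 10 uM) at increasing concentrations.
for inhibitor in (0.0, 10e-6, 100e-6):
    frac = fraction_in_complex(10e-9, 10e-9, inhibitor, 10e-6)
    print(f"[I] = {inhibitor * 1e6:6.1f} uM -> fraction in complex = {frac:.2f}")
```

With these made-up numbers the fraction of target engaged in the protein-protein complex falls from 0.50 without inhibitor to about 0.08 at 100 µM inhibitor, so a weak binder can shift the equilibrium substantially provided it reaches a sufficient concentration.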



4.1 Inhibition of the VEGF-VEGFR Interaction 

Modulation of PPIs by small molecules has been applied in several therapeutic areas and, given the pivotal roles of PPIs in many processes relevant to malignant development, the concept has been used very actively in the field of cancer.[107,253,254] Among the many PPIs important in cancer is the vascular endothelial growth factor (VEGF)-VEGF receptor (VEGFR) signaling pathway. VEGF plays a key role in angiogenesis, one of the hallmarks of cancer.[255] VEGF binds to several receptors including two major tyrosine kinase receptors (TKR), VEGFR-1 and VEGFR-2, on the surface of endothelial cells, thereby activating signal transduction and regulating both physiological and pathological angiogenesis. Whereas VEGFR-1 has been shown to stimulate endothelial cell migration,[256] VEGFR-2 is known to be a main initiator of signaling pathways in endothelial cells.[257] The VEGF-VEGFR system is a validated and promising target for anti-angiogenic treatments. Although the VEGF-VEGFR interface was found to be one of the flattest protein-protein interfaces in the investigated dataset (the interface is relatively large, >800 Å², with a planarity value of 1.7 Å,[43] well below that of most transient protein-protein complexes, whose mean planarity value is 2.7 Å[258]), successful structure-based in silico screening was performed by targeting the VEGF-binding zone of the extracellular domain D2 of VEGFR-1. As flexibility is known to be important at protein-protein interfaces, the DFprot server was used to investigate the possible plasticity of the D2 domain of VEGFR-1.[259] Analysis of the X-ray structure (with the probe mapping algorithm ProtoMol implemented in the screening package Surflex and with LigBuilder) and of the simulation suggested that this region of the D2 domain was essentially rigid, and as such docking experiments were performed on only one 3D structure of the D2 domain. Then, 8000 proprietary drug-like molecules (a subset of the French National Compound Collection) were docked with Surflex[260] onto the predicted binding pockets of the target (Figure 8).

After the in silico analysis, 206 compounds were selected for in vitro assays. Twenty compounds inhibiting the formation of the VEGF-VEGFR complex in the micromolar range were identified. The bioactive molecules contained a (3-carboxy-2-ureido)thiophene unit and the best IC50 was ~10 μM. Moreover, the most potent compound (compound ID 4321) decreased the auto-phosphorylation of VEGFR-1 induced by VEGF, inhibited HUVE cell capillary formation and disrupted the actin and tubulin networks. These findings suggest that the best hit could be a promising scaffold to probe this macromolecular complex and could be used as a starting point to develop new treatments for diseases linked to VEGFR-1.
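The screening funnel reported for this example can be summarized with simple arithmetic. The snippet below is only a bookkeeping sketch: the function name `funnel_rates` and the dictionary keys are hypothetical, while the three counts (8000 docked, 206 tested, 20 active) are those quoted in the text.

```python
def funnel_rates(n_docked: int, n_tested: int, n_active: int) -> dict:
    """Return the fractions describing a docking -> assay -> hit funnel."""
    if not (n_docked >= n_tested >= n_active >= 0) or n_tested == 0:
        raise ValueError("expected n_docked >= n_tested >= n_active >= 0 and n_tested > 0")
    return {
        "selected_fraction": n_tested / n_docked,   # share of the library sent to assay
        "hit_rate_tested": n_active / n_tested,     # experimental hit rate
        "hit_rate_library": n_active / n_docked,    # actives relative to the docked set
    }


rates = funnel_rates(n_docked=8000, n_tested=206, n_active=20)
for label, value in rates.items():
    print(f"{label}: {value:.2%}")
```

About 9.7% of the tested compounds were active, which corresponds to 0.25% of the docked collection.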



Figure 8. In silico-in vitro screening of the VEGF-VEGFR complex. The crystal structure of residues 8 to 109 of VEGF (cartoon diagram) in complex with the VEGFR-1 D2 domain (shown here in solid surface representation; yellow, hydrophobic/aromatic; red, oxygen atom and/or negatively charged; blue, nitrogen atom and/or positively charged) is shown. A probe-mapping algorithm was used to analyze the interface area (green spheres highlight regions where carbon atoms can bind with reasonable affinity, blue spheres represent nitrogen atoms and red spheres, carbonyl groups). Three pockets (A, B, C) could be identified and are surrounded by dashed circles. Structure-based virtual screening was carried out over this entire zone and 20 molecules were identified after in vitro studies. The best compound binds directly to the VEGFR-1 D2 domain and inhibits the protein-protein interaction.



4.2 Protein-Protein Interaction Inhibition Involving the
Anticoagulant Protein C

The blood coagulation pathway comprises a series of efficient enzyme-cofactor complexes assembled on the surface of negatively charged phospholipids that are exposed on activated cells at sites of vascular damage. Activation of the pathway results in the generation of high concentrations of thrombin, which clots the blood. Several anticoagulant mechanisms control the coagulation pathway and under normal conditions the systems are balanced and bleeding and thrombosis are avoided.[261-264] Inherited and acquired conditions can tip the pro-/anticoagulant balance, resulting in bleeding or thrombosis. The therapeutic principle used for treatment of bleeding disorders such as hemophilia is to supplement the missing coagulation factor, whereas inhibition of coagulation factors is the dominating approach for treatment of thrombosis. An alternative approach for treatment of hemophilia could be to inhibit the anticoagulant pathways. This could decrease the demand for recombinant factor concentrates (e.g., FVIII) and also be beneficial for the treatment of hemophilia patients with inhibitory antibodies. Protein C circulates as a vitamin K-dependent serine protease zymogen that is converted to its active form, activated protein C (APC), by the thrombin-thrombomodulin complex on the surface of endothelial cells. APC has multiple substrates: it cleaves at several positions in both coagulation cofactors FVa and FVIIIa and, in addition, cleaves the membrane-bound PAR1 receptor. The cleavage of PAR1 on endothelium results in cytoprotective effects. Thus, APC is a key component of the protein C anticoagulant pathway and a key regulator of the coagulation cascade. A point mutation in the FV gene (FV Leiden), resulting in the APC-resistance phenotype due to the replacement of Arg506 with Gln, is a highly prevalent thrombophilic risk factor.[261-263] The observation that hemophilia patients carrying this mutation have a milder bleeding tendency suggests that inhibition of APC could potentially alleviate the bleeding tendency in hemophilia patients. We have used a structure-based virtual screening approach to discover drug-like molecules that bind to an exosite of APC (the catalytic site should remain functional as much as possible to carry out the cytoprotective effect) and inhibit the interaction between APC and its substrate FVa.[44] Such molecules could potentially be developed into drugs to treat bleeding disorders. The experimentally determined 3D structure of APC was used and druggable binding pockets were searched for using several different in silico tools (FTsite, DoGSiteScorer and MetaPocket, which combines 8 predictors: LIGSITEcs, PASS, Q-SiteFinder, SURFNET, Fpocket, GHECOM, ConCavity and POCASA). Potentially interesting sites on APC were identified in one exosite located next to the active site (Figure 9).

Figure 9. Inhibition of the APC-FVa interaction. A schematic diagram represents the anticoagulant activated protein C (APC) (left). APC is composed of a Gla domain allowing interactions with the appropriate cell membranes, two EGF-like modules and a serine protease (SP) domain. Such organization positions the active site far above the membrane, in the right location to interact with its substrates and partner proteins. A diversity set containing 50000 molecules was docked in an exosite of APC (right, top) and a possible pose for one active compound is shown (right, bottom). The position of this docked molecule seems reasonable as the compound did not bind properly to a mutant protein C that had mutations in the exosite area. This exosite region is known to be important for interacting with the blood coagulation cofactor Va.

Structure-based screening (with the package Surflex) of 50000 compounds (ChemBridge Diversity set) resulted in the identification of 624 compounds that were then experimentally tested. The ability of these compounds to inhibit the degradation of FVa by APC was used as a means to further select the most potent compounds. After several repeated rounds of testing, the best 20 compounds were tested for direct binding to APC using surface plasmon resonance (SPR). To verify that the compounds specifically bound to the targeted exosites, we took advantage of available recombinant APC variants (i.e., with mutations in the exosite) in the SPR analysis. The majority of compounds influenced cleavages in FVa. It remains to be investigated whether the compounds affect the degradation of FVIIIa and the cleavage of PAR1. At this stage, these molecules bind to APC with Kd values in the range of 10⁻³ to 10⁻⁵ M and will clearly require optimization. Yet, this work provides a first proof of principle that it may be possible to rationally design small molecules targeting the exosites of APC to achieve inhibition of the anticoagulant protein C system. The future will tell whether this strategy will be a useful approach for the treatment of bleeding disorders.
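To put the reported affinity range in perspective, a dissociation constant can be converted into a standard binding free energy with ΔG° = RT ln(Kd/c°), where c° = 1 M. The sketch below is a generic thermodynamic conversion and not part of the original study; the function name `binding_free_energy` is hypothetical and the temperature of 298.15 K is an assumption, since the measurement temperature is not stated here.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = R*T*ln(Kd / 1 M); more negative values mean tighter binding.
    The default temperature is an assumption, not a value from the text.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)


# Limits of the Kd range quoted in the text (10^-3 M and 10^-5 M).
for kd in (1e-3, 1e-5):
    print(f"Kd = {kd:.0e} M -> dG = {binding_free_energy(kd):.2f} kcal/mol")
```

At the assumed temperature the quoted range corresponds to roughly -4.1 to -6.8 kcal/mol, which is consistent with the remark that these early hits will require optimization.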



5 Summary and Outlook 

Currently, most small molecule drugs on the market (about 70% of the about 3500 available drugs) hit only about 400 to 500 targets (most of them being proteins), while functional genomics predicts about 3000 to 10000 disease-modifying "traditional" proteins as potential targets (GPCRs, enzymes and ion channels; many traditional proteins are druggable but unfortunately many are of poor quality with respect to disease rationale).[265,266] As discussed in this review, ongoing investigations by PPI research teams should contribute to considerably expanding the number of potential drug targets, well beyond the traditional ones.[87,267] Small molecules can be used as chemical probes to explore biology and, definitely, small compounds dedicated to PPIs will be of great interest for gaining new insights into health and disease states. Indeed, most disease-modifying proteins exert their functions through interactions with other proteins. Although PPIs are essential for cellular functions, targeting such interactions with low molecular weight compounds (and if possible orally available molecules) was considered impossible for many years but, fortunately, several research groups have challenged the dogma. As we gain knowledge about macromolecular complexes, about PPI networks and about the chemistry required to hit such targets, we expect to see more and more modulators of PPIs entering clinical trials and, most likely, new drugs acting on this target class will be approved in the coming years. We have also discussed in this review several in silico tools that can be used to assist the rational design of PPI modulators (a simple flowchart is provided in Figure 10) and that can be combined with in vitro and in vivo experiments. These in silico methods include PPI network analysis, structural analysis and prediction of the interfaces, druggability predictions, rational design of focused compound collections and various virtual screening computations. Drug repositioning could also be applied to PPIs, as illustrated by the discovery of raloxifene and bazedoxifene as novel inhibitors of the IL-6-GP130 interface.[268] With regard to drug discovery, clearly, some biological systems are going to be easier to address with low molecular weight compounds than others, just as in the case of enzymes or of other targets in general. The many ongoing in silico developments worldwide, combined with the right in vitro and in vivo experiments and many ongoing clinical studies, should definitely contribute to a more efficient and rational discovery of new types of PPI modulators against an ever-increasing number of protein-protein complexes in all therapeutic areas.

Figure 10. Structure-based design flowchart of orthosteric PPI inhibitors. One possible flowchart (among many) making use of in silico methods to assist the rational design of PPI modulators.